# Uncertainty quantification Over the past years conformal prediction gained increasing attention. It allows to add uncertainty quantification around every model at the cost of just a bit of additional computation. * [Uncertainty quantification](#uncertainty-quantification) * [Conformal prediction for classification](#conformal-prediction-for-classification) * [Conformal prediction for regression](#conformal-prediction-for-regression) * [Conformal prediction for non-BlueCast models](#conformal-prediction-for-non-bluecast-models) ## Conformal prediction for classification In BlueCast we provide a model and architecture agnostic conformal prediction wrapper that allows the usage of any class that has a `predict` (regression) or `predict_proba` (classification) method. For `BlueCast` and `BlueCastRegression` conformal prediction can be used after the instance has been trained. ```sh from bluecast.blueprints.cast import BlueCast from sklearn.model_selection import train_test_split # we leave it up to the user to split off a calibration set X_train, X_calibrate, y_train, y_calibrate = train_test_split( X, y, test_size=0.33, random_state=42) automl = BlueCast( class_problem="binary", ) X_train["target"] = y automl.fit(X_train, target_col="target") # make use of calibration automl.calibrate(X_calibrate, y_calibrate) # point prediction as usual y_probs, y_classes = automl.predict(df_val) # prediction sets given a certain confidence interval alpha pred_sets = automl.predict_sets(df_val, alpha=0.05) # p-values for each class being the correct one pred_intervals = automl.predict_p_values(df_val) ``` For prediction sets BlueCast offers two validation functions to test the quality of prediction sets: ```sh from bluecast.conformal_prediction.effectiveness_nonconformity_measures import one_c, avg_c # return the percentage of sets with one label only (higher is better) one_c(pred_sets) # return the mean number of labels per prediction set (lower is better) avg_c(pred_sets) ``` Finally we can check if the prediction sets have the credibility as expected from the alpha. We ask the question: In how much percent of prediction sets do we find the true class? ```python from bluecast.conformal_prediction.evaluation import prediction_set_coverage prediction_set_coverage(y_val, pred_sets) # where y_val has not been used during training or calibration ``` ## Conformal prediction for regression The same is possible for `BlueCastRegression` instances. However `BlueCastRegression` offers the `predict_interval` function only: ```sh from bluecast.blueprints.cast_regression import BlueCastRegression from sklearn.model_selection import train_test_split # we leave it up to the user to split off a calibration set X_train, X_calibrate, y_train, y_calibrate = train_test_split( X, y, test_size=0.33, random_state=42) automl = BlueCast( class_problem="binary", ) X_train["target"] = y automl.fit(X_train, target_col="target") # make use of calibration automl.calibrate(X_calibrate, y_calibrate) # point prediction as usual y_hat = automl.predict(df_val) # prediction sets given a list of confidence intervals pred_sets = automl.predict_interval(df_val, alphas=[0.01, 0.05, 0.1]) ``` Finally we can check if the prediction intervals have the credibility as expected from the given alphas. We ask the question: In how much percent of prediction bands do we find the true value? ```python from bluecast.conformal_prediction.evaluation import prediction_interval_coverage prediction_interval_coverage(y_val, pred_intervals, alphas=[0.01, 0.05, 0.1]) ``` Furthermore we can calculate how broad the prediction intervals are: ```python from bluecast.conformal_prediction.effectiveness_nonconformity_measures import prediction_interval_spans prediction_interval_spans(pred_intervals, alphas=[0.01, 0.05, 0.1]) ``` This returns the mean band size per alpha. The variable `pred_intervals` is expected to contain columns of format f"{alpha}_low" and f"{1-alpha}_high" for each format for both functions. All of these functions are also available via the ensemble classes `BlueCastCV` and `BlueCastCVRegression`. ## Conformal prediction for non-BlueCast models Even though conformal prediction is available via all BlueCast classes, the library offers standalone wrappers to enjoy conformal prediction for any class, that has a `predict` or `predict_proba` method. This can be done even without sklearn models as long as the expected methods are available and expects one input parameter for the prediction functions. ```python from bluecast.conformal_prediction.conformal_prediction import ConformalPredictionWrapper from sklearn.datasets import make_classification from sklearn.model_selection import train_test_split from sklearn.linear_model import LogisticRegression X, y = make_classification( n_samples=100, n_features=5, random_state=42, n_classes=2 ) X_train, X_calibrate, y_train, y_calibrate = train_test_split( X, y, test_size=0.2, random_state=42 ) # Train a logistic regression model model = LogisticRegression(random_state=42) model.fit(X_train, y_train) # Create a ConformalPredictionWrapper instance wrapper = ConformalPredictionWrapper(model) # Calibrate the wrapper wrapper.calibrate(X_calibrate, y_calibrate) wrapper.predict_proba(x_test) # or predict_sets ``` For regression there is also a wrapper available: ```python from bluecast.conformal_prediction.conformal_prediction_regression import ConformalPredictionRegressionWrapper from sklearn.datasets import make_regression from sklearn.model_selection import train_test_split from sklearn.linear_model import LinearRegression X, y = make_regression( n_samples=100, n_features=5, random_state=42, n_classes=2 ) X_train, X_calibrate, y_train, y_calibrate = train_test_split( X, y, test_size=0.2, random_state=42 ) # Train a logistic regression model model = LinearRegression(random_state=42) model.fit(X_train, y_train) # Create a ConformalPredictionWrapper instance wrapper = ConformalPredictionRegressionWrapper(model) # Calibrate the wrapper wrapper.calibrate(X_calibrate, y_calibrate) wrapper.predict_interval(x_test) ``` Some model API's don't have a `predict_proba` method. In example the native LGBM API does not have a `predict_proba` method, but will predict all class scores, if the 'metric' parameter is set to 'multi_logloss'. In this case we can just copy the 'predict' method to 'predict_proba' and use the wrapper as usual. Here we make an advanced example and pass a custom LGBM model that uses conformal prediction for pruning and pass it to the BlueCast class. ```python from bluecast.blueprints.cast import BlueCast from bluecast.conformal_prediction.evaluation import prediction_set_coverage from bluecast.conformal_prediction.conformal_prediction import ConformalPredictionWrapper import optuna import numpy as np import pandas as pd from sklearn.model_selection import train_test_split, StratifiedKFold from sklearn.metrics import accuracy_score import lightgbm as lgb from typing import Tuple from bluecast.ml_modelling.base_classes import ( PredictedClasses, # just for linting checks PredictedProbas, # just for linting checks ) optuna.logging.set_verbosity(optuna.logging.WARNING) import warnings warnings.filterwarnings("ignore") def fillna(df, strategy="zero"): if strategy == "zero": return df.fillna(0) class LGBMTuner: def __init__(self, random_state=78, tune_with_alpha=0.95): self.param = {} self.model = None self.random_state = random_state self.tune_with_alpha = tune_with_alpha def fit( self, x_train: pd.DataFrame, x_test: pd.DataFrame, y_train: pd.Series, y_test: pd.Series, ) -> None: # some re-arangement, because we do nested cv here x_train["target"], x_test["target"] = y_train, y_test X = pd.concat([x_train, x_test]).reset_index(drop=True) y = X.pop("target") y = y.astype(int) _, _ = x_train.pop("target"), x_test.pop("target") print(X.columns) # Define an Optuna objective function for hyperparameter tuning def objective(trial): param = { # TODO: Move to additional folder with pyfile "constants" (use OS absolute path) "objective": "multiclass", "metric": "multi_logloss", "num_class": y.nunique(), "num_boost_round": trial.suggest_int("num_boost_round", 50, 1000), "lambda_l1": trial.suggest_float("lambda_l1", 1e-6, 100, log=True), "lambda_l2": trial.suggest_float("lambda_l2", 1e-6, 100, log=True), "linear_lambda": trial.suggest_float("linear_lambda", 1, 100, log=True), 'max_depth': trial.suggest_int('max_depth', 2, 10), "num_leaves": trial.suggest_int("num_leaves", 2, 256), "feature_fraction": trial.suggest_float( "feature_fraction", 0.4, 1.0 ), "feature_fraction_bynode": trial.suggest_float( "feature_fraction_bynode", 0.4, 1.0 ), "bagging_fraction": trial.suggest_float( "bagging_fraction", 0.1, 1 ), "min_gain_to_split": trial.suggest_float( "min_gain_to_split", 0, 1 ), "learning_rate": trial.suggest_float( "learning_rate", 1e-3, 0.1, log=True ), "verbose": -1, } #nb_stdevs = trial.suggest_float( # "nb_stdevs", 1.5, 10 # ) stratifier = StratifiedKFold( n_splits=5, shuffle=True, random_state=self.random_state, ) fold_losses = [] coverages = [] for fn, (trn_idx, val_idx) in enumerate(stratifier.split(X, y.astype(int))): X_train, X_val = X.iloc[trn_idx], X.iloc[val_idx] y_train, y_val = y.iloc[trn_idx], y.iloc[val_idx] #X_train = fillna(X_train, strategy="zero") # remove outliers #X_train, y_train = remove_outliers(X_train, y_train, nb_stdevs) dtrain = lgb.Dataset(X_train, label=y_train) gbm = lgb.train(param, dtrain) gbm.predict_proba = gbm.predict X_val = fillna(X_val, strategy="zero") preds = gbm.predict(X_val) predicted_classes = np.asarray([np.argmax(line) for line in preds]) matthews_loss = accuracy_score(y_val, predicted_classes) fold_losses.append(matthews_loss) X_val, X_calibrate, y_val, y_calibrate = train_test_split( X_val, y_val, test_size=0.50, random_state=42) # Create a ConformalPredictionWrapper instance wrapper = ConformalPredictionWrapper(gbm) # Calibrate the wrapper wrapper.calibrate(X_calibrate, y_calibrate) pred_sets = wrapper.predict_sets(X_val, alpha=self.tune_with_alpha) coverage = prediction_set_coverage(y_val, pd.DataFrame({"prediction_set": pred_sets})) if coverage < 1-self.tune_with_alpha-0.05: print(f"Prune trial, because of missing coverage. Achieved coverage of {coverage} %. Accuracy is {matthews_loss}") return -1, 0 else: # print(f"Achieved {coverage} % of singleton sets. Matthews is {matthews_loss}") coverages.append(coverage) score = np.mean(np.asarray(fold_losses)) coverage = np.mean(np.asarray(coverages)) print(f"Achieved {coverage} % coverage. Accuracy is {score}") return score, coverage # Create an Optuna study and optimize hyperparameters sampler = optuna.samplers.TPESampler( multivariate=True, seed=self.random_state ) study = optuna.create_study(directions=["maximize", "maximize"], sampler=sampler) study.optimize( objective, n_trials=500, timeout=60 * 60 * 3, gc_after_trial=True, show_progress_bar=True ) # Get the best hyperparameters #nb_stdevs = study.best_trial.params["nb_stdevs"] #del study.best_trial.params["nb_stdevs"] self.param = max(study.best_trials, key=lambda t: t.values[0]).params self.param["objective"] = "multiclass", self.param["metric"] = "multi_logloss", self.param["num_class"] = y.nunique(), print("++++++++++++++++++++++++++++++++++++++++") print("++++++++++++++++++++++++++++++++++++++++") print(f"Best params are: {self.param}") #X = fillna(X, strategy="zero") # remove outliers #X, y = remove_outliers(X, y, nb_stdevs) dtrain = lgb.Dataset(X, label=y) self.model = lgb.train(self.param, dtrain) def predict(self, df: pd.DataFrame) -> Tuple[PredictedProbas, PredictedClasses]: predicted_probas = self.model.predict(df) predicted_classes = np.asarray([np.argmax(line) for line in predicted_probas]) return predicted_probas, predicted_classes def predict_proba(self, df: pd.DataFrame) -> PredictedProbas: predicted_probas = self.model.predict(df) return predicted_probas custom_model = LGBMTuner(tune_with_alpha=0.05) automl = BlueCast( class_problem="multiclass", ml_model=custom_model, ) ```